Fitting a curve to data

In this notebook we do some data analysis on historical data to feed real numbers into the model. Since we assume the consumer data resemble a sine wave, because demand is seasonal, we will focus on fitting the data to this kind of curve.


In [2]:
import pandas as pd
import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt

N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
data = 3.0*np.sin(t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise

guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)
guess_phase = 0

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean

# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*np.sin(t+x[1]) + x[2] - data
est_std, est_phase, est_mean = leastsq(optimize_func, [guess_std, guess_phase, guess_mean])[0]

# recreate the fitted curve using the optimized parameters
data_fit = est_std*np.sin(t+est_phase) + est_mean

plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
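The same fit can also be expressed with `scipy.optimize.curve_fit`, which takes the model function directly instead of a residual function. This is a minimal sketch on the same kind of artificial data (the `sine` model function and its parameter names are our own, not part of the notebook):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 1000)
data = 3.0 * np.sin(t + 0.001) + 0.5 + rng.normal(size=t.size)

def sine(t, amp, phase, offset):
    # model: amp * sin(t + phase) + offset
    return amp * np.sin(t + phase) + offset

# initial guesses analogous to guess_std / guess_phase / guess_mean above
p0 = [np.std(data) * 2 ** 0.5, 0.0, np.mean(data)]
popt, _ = curve_fit(sine, t, data, p0=p0)
est_amp, est_phase, est_mean = popt
```

With good starting values both approaches converge to the same parameters; `curve_fit` additionally returns a covariance matrix for rough error bars.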


Import data for our model

This is data imported from the CBS Statline web portal.


In [3]:
importfile = 'CBS Statline Gas Usage.xlsx'
df = pd.read_excel(importfile, sheet_name='Month', skiprows=1)
df.drop(['Onderwerpen_1', 'Onderwerpen_2', 'Perioden'], axis=1, inplace=True)

#df

In [4]:
# transpose
df = df.transpose()

In [5]:
# provide headers
new_header = df.iloc[0]
df = df[1:]
df.rename(columns = new_header, inplace=True)

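The two cells above implement the common "promote the first row to the header" idiom. A self-contained sketch on a toy frame (the values `'a'`/`'b'` are illustrative, not from the CBS sheet):

```python
import pandas as pd

# toy frame mimicking the transposed sheet: the first row
# holds the intended column names, the rest is data
df = pd.DataFrame([['a', 'b'], [1, 2], [3, 4]])

new_header = df.iloc[0]   # grab the first row
df = df[1:]               # drop it from the data
df.columns = new_header   # promote it to the header
```

Assigning `df.columns = new_header` is equivalent to the `df.rename(columns=new_header, ...)` call used in the notebook, since `new_header` is indexed by the old column labels.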
In [6]:
#df.drop(['nan'], axis=0, inplace=True)
df


Out[6]:
Totaal verbruik Totaal via het hoofdtransportnet Elektriciteitscentrales Overige verbruikers Via regionale netten Verbruik bij de winning nan nan nan
2010 januari 6917 2638 1066 1572 4218 61 NaN NaN 14.6447
2010 februari 5834 2290 941 1349 3490 54 NaN NaN 13.9474
2010 maart 5075 2381 965 1416 2636 58 NaN NaN 11.9317
2010 april 3724 2056 841 1215 1614 54 NaN NaN 13.5475
2010 mei 3538 2027 742 1285 1458 53 NaN NaN 16.8825
2010 juni 2658 1844 673 1171 763 51 NaN NaN 19.275
2010 juli 2581 1927 698 1229 603 51 NaN NaN 19.4976
2010 augustus 2599 1837 668 1169 709 53 NaN NaN 18.1886
2010 september 3024 1936 729 1207 1042 46 NaN NaN 18.981
2010 oktober 4147 2345 983 1362 1742 60 NaN NaN 18.5762
2010 november 5017 2315 944 1371 2632 70 NaN NaN 19.4905
2010 december 6868 2499 994 1505 4301 68 NaN NaN 24.2864
2011 januari 5818 2304 804 1500 3453 61 NaN NaN 22.5125
2011 februari 5007 2029 728 1301 2921 57 NaN NaN 21.7842
2011 maart 4704 2129 740 1389 2515 60 NaN NaN 23.7587
2011 april 3136 1922 712 1210 1160 54 NaN NaN 22.9139
2011 mei 2981 2008 745 1263 917 56 NaN NaN 22.95
2011 juni 2549 1754 576 1178 742 53 NaN NaN 22.6295
2011 juli 2643 1838 595 1243 746 59 NaN NaN 21.5025
2011 augustus 2702 1933 646 1287 714 55 NaN NaN 21.9035
2011 september 2768 1881 649 1232 827 60 NaN NaN 23.1024
2011 oktober 3601 2076 748 1328 1468 57 NaN NaN 22.4943
2011 november 4479 2211 891 1320 2203 65 NaN NaN 23.7976
2011 december 4989 2143 962 1181 2781 65 NaN NaN 22.4048
2012 januari 5246 2076 758 1318 3104 66 NaN NaN 22.0125
2012 februari 5920 2174 833 1341 3682 64 NaN NaN 26.2215
2012 maart 3911 1773 513 1260 2069 69 NaN NaN 23.9432
2012 april 3600 1685 560 1125 1855 60 NaN NaN 24.8658
2012 mei 2747 1642 493 1149 1036 69 NaN NaN 24.1865
2012 juni 2507 1604 481 1123 839 64 NaN NaN 23.8768
... ... ... ... ... ... ... ... ... ...
2014 juli 2185 1605 501 1104 531 49 NaN NaN 16.5168
2014 augustus 2380 1670 512 1158 646 64 NaN NaN 17.4645
2014 september 2469 1738 623 1115 672 59 NaN NaN 20.851
2014 oktober 2876 1714 606 1108 1101 61 NaN NaN 21.2626
2014 november 3660 1693 556 1137 1900 67 NaN NaN 22.9568
2014 december 4798 1882 583 1299 2851 65 NaN NaN 22.53
2015 januari 5014 1836 524 1312 3112 66 NaN NaN 19.7505
2015 februari 4625 1804 607 1197 2758 63 NaN NaN 22.4884
2015 maart 4118 1657 456 1201 2395 66 NaN NaN 21.8682
2015 april 2940 1383 324 1059 1497 60 NaN NaN 22.0405
2015 mei 2367 1324 249 1075 979 64 NaN NaN 20.5611
2015 juni 1974 1222 284 938 696 56 NaN NaN 20.4868
2015 juli 2063 1448 403 1045 551 64 NaN NaN 20.8595
2015 augustus 1994 1378 375 1003 550 66 NaN NaN 19.5671
2015 september 2457 1523 416 1107 879 55 NaN NaN 19.1695
2015 oktober 3356 1747 667 1080 1548 61 NaN NaN 18.2268
2015 november 3582 1710 656 1054 1808 64 NaN NaN 17.154
2015 december 3712 1652 566 1086 1991 69 NaN NaN 15.77
2016 januari 4795 1809 664 1145 2915 71 NaN NaN 13.8942
2016 februari 4339 1608 592 1016 2668 63 NaN NaN 12.441
2016 maart 4331 1769 569 1200 2494 68 NaN NaN 12.2771
2016 april 3153 1498 436 1062 1592 63 NaN NaN 12.101
2016 mei 2284 1358 368 990 859 67 NaN NaN 12.7256
2016 juni 2064 1384 475 909 632 48 NaN NaN NaN
2016 juli 2086 1468 520 948 561 57 NaN NaN NaN
2016 augustus 2170 1552 576 976 559 59 NaN NaN NaN
2016 september 2274 1649 659 990 575 50 NaN NaN NaN
2016 oktober 3359 1800 697 1103 1494 65 NaN NaN NaN
2016 november 4301 1845 785 1060 2392 64 NaN NaN NaN
2016 december 4770 1977 794 1183 2732 61 NaN NaN NaN

84 rows × 9 columns


In [7]:
x = range(len(df.index))
df['Via regionale netten'].plot(figsize=(18,5))
plt.xticks(x, df.index, rotation='vertical')
plt.show()


Now let's fit the different consumer groups


In [8]:
#b = self.base_demand
#m = self.max_demand
#y = b + m * (.5 * (1 + np.cos((x/6)*np.pi)))
#b = 603
#m = 3615

N = 84 # number of data points
t = np.linspace(0, 83, N)
#data = b + m*(.5 * (1 + np.cos((t/6)*np.pi))) + 100*np.random.randn(N) # create artificial data with noise
data = np.array(df['Via regionale netten'].values, dtype=np.float64)

guess_mean = np.mean(data)
guess_std = 2695.9075546 #2*np.std(data)/(2**0.5)
guess_phase = 0

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_mean + guess_std*(.5 * (1 + np.cos((t/6)*np.pi + guess_phase)))

# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*(.5 * (1 + np.cos((t/6)*np.pi+x[1]))) + x[2] - data
est_std, est_phase, est_mean = leastsq(optimize_func, [guess_std, guess_phase, guess_mean])[0]

# recreate the fitted curve using the optimized parameters
data_fit = est_mean + est_std*(.5 * (1 + np.cos((t/6)*np.pi + est_phase)))

plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
print('Via regionale netten')
print('max_demand: %s' %(est_std))
print('phase_shift: %s' %(est_phase))
print('base_demand: %s' %(est_mean))


Via regionale netten
max_demand: 2695.90755293
phase_shift: -0.0749085918325
base_demand: 380.296224651
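To judge whether such a fit is good beyond eyeballing the plot, a coefficient of determination can be computed from `data` and `data_fit`. This helper is not part of the notebook, just a hypothetical sketch:

```python
import numpy as np

def r_squared(data, fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((data - fit) ** 2)
    ss_tot = np.sum((data - np.mean(data)) ** 2)
    return 1.0 - ss_res / ss_tot

# a perfect fit gives 1.0; predicting the mean everywhere gives 0.0
y = np.array([1.0, 2.0, 3.0, 4.0])
```

Calling `r_squared(data, data_fit)` after each `leastsq` fit would give one comparable quality number per consumer group.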

In [9]:
#data = b + m*(.5 * (1 + np.cos((t/6)*np.pi))) + 100*np.random.randn(N) # create artificial data with noise
data = np.array(df['Elektriciteitscentrales'].values, dtype=np.float64)

guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)
guess_phase = 0

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_mean + guess_std*(.5 * (1 + np.cos((t/6)*np.pi + guess_phase)))

# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*(.5 * (1 + np.cos((t/6)*np.pi+x[1]))) + x[2] - data
est_std, est_phase, est_mean = leastsq(optimize_func, [guess_std, guess_phase, guess_mean])[0]

# recreate the fitted curve using the optimized parameters
data_fit = est_mean + est_std*(.5 * (1 + np.cos((t/6)*np.pi + est_phase)))

plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
print('Elektriciteitscentrales')
print('max_demand: %s' %(est_std))
print('phase_shift: %s' %(est_phase))
print('base_demand: %s' %(est_mean))


Elektriciteitscentrales
max_demand: 261.178751524
phase_shift: 0.423813583566
base_demand: 483.327290904

In [10]:
#data = b + m*(.5 * (1 + np.cos((t/6)*np.pi))) + 100*np.random.randn(N) # create artificial data with noise
data = np.array(df['Overige verbruikers'].values, dtype=np.float64)

guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)
guess_phase = 0
guess_saving = .997

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = (guess_mean + guess_std*(.5 * (1 + np.cos((t/6)*np.pi + guess_phase)))) #* np.power(guess_saving,t)

# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*(.5 * (1 + np.cos((t/6)*np.pi+x[1]))) + x[2] - data
est_std, est_phase, est_mean = leastsq(optimize_func, [guess_std, guess_phase, guess_mean])[0]

# recreate the fitted curve using the optimized parameters
data_fit = est_mean + est_std*(.5 * (1 + np.cos((t/6)*np.pi + est_phase)))

plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
print('Overige verbruikers')
print('max_demand: %s' %(est_std))
print('phase_shift: %s' %(est_phase))
print('base_demand: %s' %(est_mean))


Overige verbruikers
max_demand: 210.422418112
phase_shift: 0.0416396205406
base_demand: 1077.34831476

Price forming

In order to estimate willingness to sell and willingness to buy, we look at historical data over the past few years, specifically the day-ahead market at the TTF, although this data does not necessarily reflect real consumption.


In [11]:
inputexcel = 'TTFDA.xlsx'
outputexcel = 'pythonoutput.xlsx'

price = pd.read_excel(inputexcel, sheet_name='Sheet1', index_col=0)
quantity = pd.read_excel(inputexcel, sheet_name='Sheet2', index_col=0)

price.index = pd.to_datetime(price.index, format="%d-%m-%y")
quantity.index = pd.to_datetime(quantity.index, format="%d-%m-%y")

pq = pd.concat([price, quantity], axis=1).reindex(price.index)  # align on the price dates (join_axes was removed from pandas)
pqna = pq.dropna()
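The concat-then-`dropna` step keeps only the days where both a price and a volume are present. A minimal sketch with made-up dates and values (not the TTF data):

```python
import pandas as pd

idx = pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06'])
price = pd.DataFrame({'Last': [13.9, 14.1, None]}, index=idx)
quantity = pd.DataFrame({'Volume': [1.2e9, None, 1.1e9]}, index=idx)

# side-by-side concat on the shared date index, then drop
# any day where either series is missing
pq = pd.concat([price, quantity], axis=1).reindex(price.index)
pqna = pq.dropna()
```

Only the first date survives here, since it is the only one with both columns filled.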

In [15]:
year = np.arange(2008,2017,1)

coefficientyear = []

for i in year:
    x = pqna['Volume'].sort_index().loc["%s" % i]
    y = pqna['Last'].sort_index().loc["%s" % i]
    # plot the data points
    plt.plot(x, y, 'o')
    # calc the trendline
    z = np.polyfit(x, y, 1)
    p = np.poly1d(z)
    plt.plot(x,p(x),"r--", label="%s"%i)
    plt.xlabel("Volume")
    plt.ylabel("Price Euro per MWH")
    plt.title('%s: y=%.10fx+(%.10f)'%(i,z[0],z[1]))
    # plt.savefig('%s.png' %i)
    plt.show()
    # the line equation:
    print("y=%.10fx+(%.10f)"%(z[0],z[1]))
    # save the variables in a list
    coefficientyear.append([i, z[0], z[1]])


y=0.0000000079x+(24.9204557360)
y=0.0000000212x+(11.3829297275)
y=0.0000000247x+(16.6889008406)
y=-0.0000000104x+(22.8865618305)
y=-0.0000000150x+(25.1761409718)
y=-0.0000000062x+(27.1329496082)
y=0.0000000898x+(19.8391566981)
y=-0.0000000008x+(19.8173818905)
y=0.0000000447x+(12.8282088664)
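Each of the equations above is a degree-1 `np.polyfit` over one year of (volume, price) points: `z[0]` is the slope, `z[1]` the intercept, and `np.poly1d` turns them into a callable line. A self-contained sketch on exact toy data (not the TTF numbers):

```python
import numpy as np

# toy points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

z = np.polyfit(x, y, 1)   # least-squares line: [slope, intercept]
p = np.poly1d(z)          # callable trendline, p(x) = z[0]*x + z[1]
```

On noisy data the recovered slope and intercept are least-squares estimates rather than exact, which is what the per-year coefficients in `coefficientyear` represent.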

In [16]:
len(year)


Out[16]:
9

In [ ]: